fix(s2): enforce 1 rps throttling across S2 stages #5

Closed
spignotti wants to merge 7 commits into main from feat/v0.1.0-polish

@spignotti
Owner

Summary

  • add s2_requests_per_second setting (default 1.0) to match Semantic Scholar key limits
  • throttle outbound S2 calls in discovery and enrichment stages
  • update litresearch.toml.example with S2 timeout + rate settings
  • add discovery-stage test to verify throttling wait behavior

Why

Your approved Semantic Scholar key is limited to 1 request/second cumulative. This change makes that limit explicit and enforced by default.
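
As a rough illustration of what enforcing this limit can look like, here is a minimal sketch of a minimum-interval throttle. The `Throttle` class and its `wait()` method are hypothetical names, not the code in this branch; only the `requests_per_second` idea and its default of 1.0 come from the summary above. The clock and sleep functions are injectable so the wait behavior can be checked without real delays.

```python
import time


class Throttle:
    """Hypothetical helper: space out calls so they stay under a fixed rate."""

    def __init__(self, requests_per_second=1.0, clock=time.monotonic, sleep=time.sleep):
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        # Minimum number of seconds that must separate two consecutive calls.
        self._min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # timestamp of the previous call, if any

    def wait(self):
        """Block until at least the minimum interval has passed since the last call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self._min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Because the key's limit is cumulative, a sketch like this would presumably need a single shared instance called before every outbound S2 request in both discovery and enrichment, rather than one instance per stage.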

Validation

  • uv run nox -s lint typecheck test
  • all sessions passed (24 tests)

- Guard json.loads() in analysis.py with try/except JSONDecodeError
- Add s2_timeout config setting (default 10s) with retry=False for S2 client
- Prevent PDF double-download by saving during analysis and marking pdf_downloaded
- Skip already-downloaded PDFs in export stage
- Refactor _build_settings to use immutable Settings(**overrides) pattern
- Add --overwrite flag to run command
- Auto-increment output directory name when directory exists and is populated
- Add tests for collision detection and overwrite behavior
- Write ScreeningResult with score=0 for papers without abstract
- Wrap call_llm in try/except LLMError in query_gen with clear error message
- Rename litresearch.toml to litresearch.toml.example (git mv)
- Add html.unescape() for title, abstract, venue in Paper.from_s2()
- Test query generation with successful LLM response and error handling
- Test screening behavior for no-abstract papers and JSON parse failures
- Test discovery S2 client configuration and paper deduplication
- Add comment for BATCH_SIZE in enrichment.py
- Add run summary block in pipeline.py with timing and counts
- Change screening_threshold default from 40 to 60 with documentation
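
The output-directory items in the list above (the `--overwrite` flag and auto-incrementing when a directory exists and is populated) could be sketched roughly as follows. The function name `resolve_output_dir` and the `-2`, `-3` suffix scheme are assumptions made for illustration; the commit message does not say how the incremented name is formed.

```python
from pathlib import Path


def resolve_output_dir(base: Path, overwrite: bool = False) -> Path:
    """Hypothetical helper: pick a directory that will not clobber a previous run."""

    def is_populated(path: Path) -> bool:
        # A directory counts as populated only if it exists and has entries.
        return path.is_dir() and any(path.iterdir())

    # With overwrite requested, or nothing to collide with, reuse the base path.
    if overwrite or not is_populated(base):
        return base

    # Otherwise try base-2, base-3, ... until a free or empty directory is found.
    # The suffix format is an assumption, not taken from the PR.
    n = 2
    while True:
        candidate = base.with_name(f"{base.name}-{n}")
        if not is_populated(candidate):
            return candidate
        n += 1
```

An empty existing directory is reused here, which matches the "exists and is populated" wording of the commit message.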
Add configurable s2_requests_per_second (default 1.0) and throttle
requests in discovery and enrichment to respect Semantic Scholar
rate limits. Update example config and add discovery rate-limit test.
@spignotti
Owner Author

Closing in favor of #6. This branch was based on pre-squash history and showed conflicts after #4 was squash-merged.

@spignotti spignotti closed this Mar 23, 2026
@spignotti spignotti deleted the feat/v0.1.0-polish branch April 10, 2026 09:24